EPJ manuscript No. 

(will be inserted by the editor) 



Selective advantage of topological disorder in biological evolution 

Michal Kolar 1,2a and Frantisek Slanina lb 

1 Institute of Physics, Academy of Sciences of the Czech Republic, Na Slovance 2, CZ-18221 Praha, Czech Republic 

2 Institute of Physics of Charles University in Prague, Ke Karlovu 5, CZ-12116 Praha, Czech Republic 

(This version was processed on: February 1, 2008) 

Abstract. We examine a model of biological evolution of Eigen's quasispecies in a so-called holey fitness 
landscape, where the fitness of a site is either (lethal site) or a uniform positive constant (viable site). The 
evolution dynamics is therefore determined by the topology of the genome space which is modelled by the 
random Bethe lattice. We use the effective medium and single-defect approximations to find the criteria 
under which the localized quasispecies cloud is created. We find that shorter genomes, which are more 
robust to random mutations than average, represent a selective advantage which we call "topological". A 
way of assessing empirically the relative importance of reproductive success and topological advantage is 
suggested. 

PACS. 05.40.-a Fluctuation phenomena, random processes, noise, and Brownian motion - 87. 23. Kg Dy- 
namics of evolution - 72.15.Rn Localization effects (Anderson or weak localization) 



1 Introduction 

The mechanism of biological evolution is a very challeng- 
ing topic for the physical community. This is well ex- 
pressed in the numerous models of biological evolution 
that have emerged in recent years. The list of models 
| studied starts with large-scale properties of the evolution- 

■ ary process — massive global extinctions — and ends with 
the works which are aimed at following the replicative 
behavior of the chemical structures holding information, 
namely DNA molecules. A lot of effort has been devoted 

I to this area recently piOI51llU5UM7ll51l^llTUlFrTlll^lli3lll4lll51 

■ ITtJ] and still new fruitful ideas emerge |17II18II19| . 

The process of biological evolution consists in three 
steps: reproduction —> mutation — * selection. The im- 
portant thing to note is that the biological fitness function 
which denotes an individuals' ability to produce viable off- 
springs depends on their phenotype. On the other hand, 
' the mutations occur in the genotype — information stored 
in a sequence of DNA. The fitness function, which assigns 
to each microscopic genotype its ability to reproduce and 
pass on to the next generation, shows an overwhelming 
complexity. This property makes the theoretical treatment 
of evolution an extremely complicated task. 

Simplifications of the problem are necessary. The fruit- 
ful idea of adaptive landscapes was introduced by Wright 
|2*U) and was later further simplified to fitness landscapes 
later. In this scheme the individual is represented as a 
point in a multidimensional space and to every point a 



fitness value is assigned. Thus, a landscape is formed of 
mountains of genetically adapted positions and valleys of 
lethal genomes. Note the idealization: the fitness is di- 
rectly given by the genome of the individual, not by its 
(extended) phenotype. 

In fact, the fitness landscape is not static since it strongly 
depends on the ever changing environment, which includes 
interactions with (co)evolving species as well as abiotic in- 
fluences 21,22,23 . Evolutionary process manifests itself 
the ascent of individuals to peaks on the fitness landscape. 

A broad set of different fitness landscapes was recently 
used to study the behavior of the evolutionary system. 
These models employ both static |24H25| and dynamic 
landscapes 23,26,27, 28] . They include the sharply-peaked 
landscape (SPL) with a single preferred genome (wild- 
type), the Fujiyama landscape, or the holey landscape (HL). 
Several recent reviews summarize various approaches ex- 

Generally, one simplifies the genetic code considering 
only a two-letter {0, 1} alphabet. Let us consider, for ex- 
ample, the digits 1 and as symbols for two different alle- 
les of a certain gene 1 . Assuming constant genome length 
of d loci the state space is a hypercube Q = {0, l} d . 

The dynamics of evolution on fitness landscapes have 
the most prominent scheme in the quasispecies model as 
introduced by Eigen |32| . Originally it was introduced as 
a model for chemical prebiotic evolution, but it showed 
itself plausible in the investigation of the mechanisms of 
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microevolution of viruses and bacteria, i.e. organisms with 
relatively simple genomes. 

The quasispecies is defined as a cloud of closely related 
individuals. They hang together but certainly they may 
move on the fitness landscape. We assume an infinite pop- 
ulation size and therefore the dynamics of the quasispecies 
are probabilistic, but without any noise which would be 
induced by finite size effects. The quasispecies obeys three 
basic processes: reproduction, selection and is liable to mu- 
tational genetic changes. Reproduction and selection are 
treated together — the reproductive ability depends on the 
fitness and hence selection takes part here. Mutation can 
occur in the individual's genome with a probability rate 
jti. 

It is evident that the native geometry of the genome 
space is the above mentioned hypercube Q = {0, l} d . So, 
this is the natural starting point when building a model 
for a fitness landscape. Recently, the main focus has been 
aimed at the presence or absence of the adaptive regime 
induced by a single maximum in the fitness landscape. 
Therefore, the first thing to try is the sharply-peaked land- 
scape on the hypercube. This model was solved exactly by 
Galluccio et al. |35I33| . 

The most important feature found in the quasispecies 
model in SPL is the error threshold |'S5U.'S6 | . It is the phase 
transition that separates two regimes of the quasispecies 
evolution, namely the adaptive regime and the wandering 
regime. In the adaptive regime the localized cloud of qua- 
sispecies is formed around the wildtype (for us a certain 
site in the lattice), whereas in the wandering regime no 
quasispecies is formed due to a mutation load. 

The error threshold phenomenon comes out of the com- 
petition between the selective advantage of a certain wild- 
type and the mutation load presented by the rate fi. The 
threshold is then characterized by the specific value of 
the selective advantage. We want to show that the selec- 
tive advantage in the specific site is not the only parame- 
ter, whose value can distinguish two significantly different 
regimes. There are also geometrical properties, e.g. con- 
nectivity of the site, that can make the genotype in the 
specific site advantageous. These are the main tasks of this 
article. 

Indeed, the real landscapes are far more complicated 
than SPL. It is expected that rugged landscapes with 
many competing maxima represent a realistic picture pQ. 
Such landscapes are well-known in the theory of spin glass- 
es [37| and a "spin-glass" theory of evolution was investi- 
gated, e.g. in In our previous work we have studied 
several similar models of the fitness landscape, too |39|. 

Yet another approach to the modelling of the fitness 
landscape is used for computations |4(JU41| — the so-called 
holey landscape (HL), where the fitness is either a positive 
constant (which may be set to I) or it is which means 
that the individual with the corresponding genetic code 
dies with a probability of I without having offspring. 

Indeed, a large part of the point mutations which may 
occur at the basic level of the evolutionary picture are 
lethal for the individual — they lead to the 'lethal' sites. 
Therefore, the hypercube does not represent a good ap- 



proximation to the evolutionary dynamics, because only 
a small part of its edges represent paths to possible new 
'hospitable' genomes. 

A sparsely connected set of points selected at some 
of the hypercube corners is perhaps a better choice. Such 
sparse graphs and their adjacency matrices are under study 
extensively at present; see e.g. [4211431144) and references 
therein. The authors observe the localization of the eigen- 
vectors of the sparse matrices due to topological charac- 
teristics only 

We approach the problem from a somewhat different 
perspective which resembles the study of topologically dis- 
ordered solids where the random network of bonds is often 
well modelled by the Bethe lattice 05] ■ This, of course, 
supposes that there are no short loops in the graph. Sup- 
posing we are above the percolation threshold, our random 
lattice forms a giant cluster within the hypercube and the 
typical length of loops is in the number of sites N = 2 d of 
order log TV, where N is the total number of sites N = 2 d , 
see ■ Thus, the Bethe lattice may be a good model for 
the topology of our sparse graph. 

The main question addressed in our work will be the 
following: Selective advantage in biological evolution is 
usually attributed to higher reproductive success. If ad- 
vantage due to high individual fitness exceeds a certain 
threshold, a quasispecies is formed around the site. This 
is the usual error-threshold phenomenon. Here we ask, 
whether some factors related purely to the structure or 
topology of the genome space may lead to similar selec- 
tive advantage, and if a certain threshold can be found 
separating the adaptive and wandering regime of biologi- 
cal evolution. 



2 Model 

Recently, we have modelled evolution in a holey land- 
scape using the regular Bethe lattice. Now we will try 
to represent it in a more precise way and take into ac- 
count the irregularity of the lattice. For each site we select 
whether the links that leave it lead to another 'hospitable' 
site of the hypercube, or not. In the latter case, there are 
some links leading from the site to the 'lethal' sites of the 
hypercube and so all mutants that took this direction are 
doomed. 

We start with a regular Bethe lattice with a connec- 
tivity k. We construct the irregular Bethe lattice by ran- 
domly assigning lethal sites and removing all sites which 
are connected to the rest of the lattice only through a 
lethal site. The evolution process amounts to diffusion on 
this lattice. Hops from a viable site to any of its neighbors 
occur with an equal rate ji. Hops from lethal sites are pro- 
hibited. So, the edges to lethal sites are "dead ends" or 
"dangling bonds", in the language of condensed matter 
physics. 

Let i be a viable site. Let us denote (i) as the set of 
viable neighbors of i. Ki = < k is the number of those 
viable neighbors. For k we suppose only that it is large 
enough to fulfill the previous inequality. The arbitrariness 
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of the choice of k will be clarified later on. Intuitively, dur- 
ing evolution the probability will flow out from i to all its 
k neighbors, but flow in only from m viable ones. There- 
fore, we expect that a site with larger re, is more likely 
to gather individuals and form a quasispecies cloud. The 
scope of this paper is to elaborate this intuitive picture in 
a more formal manner. 

If we denote Pi(t) as a (relative) population 2 of site i 
at time t, we can write the following master equation 

p i {t) = Y J T ij P j {t) (1) 

3 

where the matrix T contains the effects of mutations and 
reproduction: 

{-fik + C i=j 
A* 3 e (*) (2) 
elsewhere. 

The constant £ is introduced artificially in order to keep 
the total population constant. In fact, there are two causes 
of the net population outflow which must be countered by 
the £ term. The first one is the presence of traps — lethal 
sites that absorb the individuals. The second one comes 
from the very topology of the Bethe lattice. Indeed, any 
finite Bethe lattice has the rather counter-intuitive prop- 
erty that the number of the surface sites is comparable to 
the number of bulk sites and this property holds even in 
the limit of an infinite number of sites. Therefore, there is 
always a net flow of probability toward the surface. How- 
ever, we are interested in the properties of the sites deep 
in the bulk and the outflow toward the surface should be 
considered as an artifact of the Bethe lattice approxima- 
tion. To sum up, we will fix the value of £ later on in the 
calculations, in order that the population remains fixed. 

The dynamical matrix T is in the model symmetric, 
and this enables us to use the method of the resolvent in 
our calculations. The symmetry results from the assump- 
tion of equal mutation rates, \x. This widely used approx- 
imation simplifies all the following calculations and, since 
we are interested only in a stationary state of the system, 
it is not considered to change the main results. One can 
introduce the edge-dependent rates fiij and then £ would 
become a function of all pi (t) and Eq. (JJJ would generally 
become non-linear one. But this approach goes beyond the 
scope of our article. 



cloud, through the properties of the resolvent of the ma- 
trix T 

G(z)^(z-T)~ 1 . (3) 

The idea of the calculation is quite simple: In the long- 
time limit only the largest eigenvalue of the matrix T sur- 
vives, and the corresponding eigenvector describes the sta- 
tionary state of the evolutionary system. 

In order to find the largest eigenvalue of T, we search 
for the poles of an element of the resolvent matrix Q. These 
are exactly the eigenvalues of T, no matter which clement 
of Q we choose. And since we want to observe the forma- 
tion of the localized state around a specific site i = 1, we 
need the diagonal matrix element of the resolvent 

G{z) = [Q{z)]n (4) 

which can be calculated using the partitioning (projector) 
method [17|. explained in more detail in The loop- 
less structure of the Bethe lattice greatly simplifies the 
treatment. We proceed essentially in two steps, which are 
illustrated in the Fig. ^ In the first step, we project out 
of the site i = 1 itself. The rest of the Bethe lattice splits 
into disconnected branches. We find 

G[z) = z + M-C-M 2 £ je( i)^)' (5) 

where is the diagonal element of the projected resol- 

vent on the terminal site of the branch starting with site 
j. (The terminal sites are denoted as 2 in Fig. Q]) 




Fig. 1. Division of the Bethe lattice during the partitioning. 



3 Partitioning 

As in the previous paper |39| we investigate the forma- 
tion of a localized state, now interpreted as a quasispecies 

2 By "population" we mean the infinite population limit, the 
probability of finding a given individual in the site i of the 
lattice. We prefer the term "relative" population because of 
the perturbations that will be added to the lattice and will 
destroy the conservation of the overall relative population. 



In the second step, we calculate rj{z) by projecting 
out the site j. We find, similar to the previous case, 

Fj {z) = * + ^-C-^£, ea)U1} /H*) ' (6) 

Some of the sites I € (j)\{l} are denoted by 3 in Fig.^ 
Iterating the equation @ we could in principle calculate 
the resolvents on the terminal sites and inserting them 
into Eq. J^} obtain the desired quantity. 
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This procedure works well in an infinite regular Bethe 
lattice, where Pj(z) for j deep in the bulk does not depend 
on the site index j and © represents in fact a closed 
equation for r(z) = Pj[z) Vj. However, this procedure 
cannot be directly applied in the case of an irregular Bethe 
lattice. Nevertheless, it is a good starting point for an 
approximation we will describe in the following. 



4 Effective medium approximation 

The mean-field-type treatment of disordered solids was 
developed a long time ago within the coherent potential 
approximation (CPA) In the theory of sparse ran- 
dom matrices it was elaborated using the replica method 
42 43, 44 and called the effective medium approximation 
(EMA) . In this section we will use the EMA for the irreg- 
ular Bethe lattice without using the replica trick. 

The main idea relies on a simple observation that the 
sum X)ie(j)\{i} Fi{z) containing Kj — 1 terms can be re- 
placed by its average value for large Kj, thus neglecting 
fluctuations. Indeed, it was proved that the CPA is exact 
in the limit of infinite connectivity • Therefore, our 
version of the EMA amounts to approximating 



E r <w 



- 1) (r(z)) . 



(7) 



In order to close the equations, we must average expression 
jnjl over the probability distribution of connectivities P (re) 



im) = E 



P(re) 



z + ^fc-C-M 2 (re-l)(r(z)} 



(8) 



The latter equation iJSJ is the core of the EMA. When 
we insert its solution (r(z)) into JSJ and perform again 
the average over connectivities, we obtain the averaged 
diagonal element of the resolvent (G(z)), which is now 
site- independent. The parameter £ represents the shift in 
the variable z and should be adjusted so that the upper 
edge of the support of the imaginary part of (G(z)) (i.e. 
the density of states) lies at z = 0. This expresses the 
requirement of the conservation of the population size. 

As an illustration we show in Fig.[2the real and imagi- 
nary parts of (r(z)) for the connectivity distribution cho- 



P(k) =pS(k-2) + (1-p)S(k-3) 



(9) 



for < p < 1. We can see that the imaginary part of 
{P(z)) approaches zero as z 1 / 2 at the band edge and the 
real part approaches a finite limit. From the technical 
point of view, the finiteness of the limit is the source of 
the transition between localized and delocalized states. 



5 Single defect 

So far we have investigated the Bethe lattice as an aver- 
aged homogeneous effective medium. Now we investigate 




Fig. 2. The averaged resolvent at terminal site (-T(z)} for the 
irregular Bethe lattice with connectivity distribution defined 
in JUJ, where p = s. Full line: imaginary part. Dashed line: 
real part. 




Fig. 3. The "defect" site with a different connectivity. Thin 
links represent lethal mutations, the thick ones mutations to 
viable sites. 



the behavior of the resolvent at a site i = 1 with a specific 
connectivity rei. The situation is schematically depicted in 
Fig. El 

This approach has a biological motivation. What we 
see in nature is that there are frequent groups of closely 
related animals. These groups form taxonomic classes or, 
better said, the classes are defined as the groups of such 
closely related individuals. In the following, we examine if 
the clustering of individuals, modelled as the addition of 
the site with largest connectivity, leads to some observable 
changes in the quasispecies evolutionary process. 

The on-site resolvent corresponding to this site can be 
found from Eq. where the resolvents at the terminal 
sites are approximated as in (J7J) . This is the essence of the 
single-defect approximation (SDA). Hence 



Gsda(z) = 



1 



(10) 



+ /ifc-C-/i 2 rei(r(z)) ■ 

A state localized at this site exists, if the resolvent Gsda(z) 
has a pole for a real positive z. This is equivalent to the 
condition 



rei > re c = 



_ M - c 



[l 2 Re(r(0)) 



(11) 
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Therefore, the state is localized and the quasispecies cloud 
is formed for integer k\ greater than k c . This corresponds 
to the adaptive regime of the evolutionary process. We 
use as an example again the distribution IgJJ and show 
in the Fig. 0] the dependence of the threshold n c on the 
parameter p of the probability distribution. 



3.5 - 



3 - 



2.5 



adaptive regime 



wandering regime 



0.2 



0.4 0.6 

V 



0.8 



Fig. 4. The dependence of the critical value of the connectiv- 
ity, K c , on p, for connectivity distribution @. 



We will consider yet another type of "defect" in the 
effective medium. Imagine that the site investigated, i = 1, 
corresponds to a genome of length different to the average. 
This difference is translated in our model as a choice of 
different total connectivity k in the site i (including edges 
to both viable and lethal sites). So, we may replace k — > 
k + A in Eq. 11U|) . Then we may investigate the transition 
from the wandering to the adaptive regime when varying 
two parameters A and K\. We will see that the individuals 
added due to the £ term in Eq. Q are redistributed in 
the lattice in such a manner that the total population 
changes. This is only due to addition of the "defect" and 
the localization of individuals in its vicinity. The condition 
for the existence of a localized state, i.e. for the adaptive 
regime, is analogous to and can be written as 



A < A c ee fiKt Re(r(0)) - 



M - c 



(12) 



We show in Fig. an example of the dependence of A c 
for the same connectivity distribution as in previous 
cases, and for a specific, fixed choice of «i. 



6 Conclusions 

We modelled the complex fitness landscape of biological 
evolution with an irregular Bethe lattice. The formation 
of localized quasispecies in the adaptive regime was ob- 
served via the occurrence of the isolated pole in the on- 
site resolvent. Using the effective medium approximation 
we calculated the disorder-averaged resolvent and within 
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Fig. 5. The dependence of the critical value A c on the prob- 
ability p, for distribution © and ki = 5. 



the single-defect approximation we investigated the local- 
ization. 

We found that the quasispecies is formed if the site 
is connected to a sufficiently large number of viable sites. 
We found the condition for the critical value of the connec- 
tivity k c separating the adaptive evolutionary regime for 
k > n c and the wandering regime for n < k c . As expected, 
k c grows with the average connectivity and qualitatively 
speaking the quasispecies are formed at such sites that 
have sufficiently larger number of viable neighbors than 
average. We may interpret it as a purely topological selec- 
tive advantage: The species has better chance to survive 
not because of its individual reproductive abilities, but 
because it is less vulnerable to random alterations of the 
genetic code. 

We investigated also another deviation from the mean, 
connected to the overall number of neighbors. In more bio- 
logical language it corresponds to the length of the genetic 
code. We observed that the adaptive regime is favored for 
lower values of the connectivity, i.e. for shorter genomes, 
if the number of viable neighbour sites is considered con- 
stant. This type of selective advantage is therefore also of 
topological origin and has a similar biological interpreta- 
tion to that presented above. Indeed, longer genome means 
higher probability of a lethal mutation. 

In both cases it is important to note that it is the 
deviation from the average topology which makes the se- 
lective advantage work and which leads to the formation 
of localized states. So, it is the (sufficiently strong) topo- 
logical disorder that is responsible for the formation of 
quasispecies. 

We may conclude by summarizing that without resort 
to individual reproductive capacities, biological evolution 
favors genomes which are shorter and more robust to ran- 
dom mutations. This has one more important implication; 
a genome, which can be easily mutated without affecting 
the death of its carrier, means also a less-defined species. 
One may therefore predict that successful species will exist 
in a broad variety of slightly different sub-species. This ef- 
fect is caused by the selective advantage of certain topolo- 



6 



Michal Kolaf, Frantisek Slanina: Selective advantage of topological disorder in biological evolution 



gies of the genome space. Observation of the variability 
within a single species may therefore say something of the 
relative importance of topological selective advantage in 
comparison to individual reproductive success. 

The authors would like to thank to Jan Masek for critical read- 
ing of the manuscript. Michal Kolaf would like to thank to 
Vladislav Capek for fruitful discussions and to Anton Markos 
for helpful discussions on the topic of biological evolution, too. 
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